Rank by Readability: Document Weighting for Information Retrieval

نویسندگان

  • Neil Newbold
  • Harry McLaughlin
  • Lee Gillam
چکیده

In this paper, we present a new approach to ranking that considers the reading ability (and motivation) of the user. Web pages can be, increasingly, badly written with unfamiliar words, poor use of syntax, ambiguous phrases and so on. Readability research suggests that experts and motivated readers may overcome confusingly written text, but nevertheless find it an irritation. We investigate using readability to re-rank web pages. We take an extended view of readability that considers the reading level of retrieved web pages using techniques that consider both textual and cognitive factors. Readability of a selection of query results is examined, and a re-ranking on readability is compared to the original ranking. Results to date suggest that considering a view of readability for each reader may increase the probability of relevance to a particular user.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: this research investigates the impact of authors’ rank in Bibliographic networks on document-centered model of Expertise Retrieval. Its purpose is to find out what kind of authors’ ranking in bibliographic networks can improve the performance of document-centered model.   Methodology: Current research is an experimental one. To operationalize research goals, a new test colle...

متن کامل

N-gram Fragment Sequence Based Unsupervised Domain-Specific Document Readability

Traditional general readability methods tend to underperform in domain-specific document retrieval because they fail to effectively differentiate the reading difficulty of the individual domain-specific terms and the semantic associations between the textual units in a document. On the other hand, recently proposed domain-specific readability methods have relied upon an external knowledge base ...

متن کامل

Integrating Understandability in the Evaluation of Consumer Health Search Engines

In this paper we propose a method that integrates the notion of understandability, as a factor of document relevance, into the evaluation of information retrieval systems for consumer health search. We consider the gain-discount evaluation framework (RBP, nDCG, ERR) and propose two understandability-based variants (uRBP) of rank biased precision, characterised by an estimation of understandabil...

متن کامل

Ontology based Similarity Measure in Document Ranking

This paper presents a methodology for the ontology based semantic annotation of web pages with annotation weighting scheme that takes advantage of the different relevance of structured document fields. The retrieval model is based on the importance factors of the structural elements, which are used to re-rank the documents retrieval by the ontology based distance measure. The relevance concept ...

متن کامل

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010